Automatically Augmenting Terminological Lexicons from Untagged Text
Authors
Abstract
Lexical resources play a crucial role in language technology, but lexical acquisition is often time-consuming, laborious and costly. In this paper, we describe a method for automatically acquiring technical terminology from domain-restricted texts without the need for sophisticated natural language processing tools, such as taggers or parsers, or for text corpora annotated with labelled cases. The method uses prior (seed) knowledge to discover co-occurrence patterns for the terms in the texts. A bootstrapping algorithm has been developed that identifies patterns and new terms iteratively. Experiments with scientific journal abstracts in the biology domain indicate an accuracy rate for the extracted terms ranging from 58% to 71%. The new terms have proved useful for improving the coverage of a system used for terminology identification tasks in the biology domain.
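The abstract does not say how patterns are represented or scored, so the following is only a minimal sketch of the general idea of seed-driven bootstrapping, not the authors' algorithm. It assumes single-word terms, whitespace tokenization, and a pattern defined as the pair of immediate left and right neighbours of a known term. The names `tokenize`, `contexts`, `bootstrap`, `min_pattern_hits`, and the sample sentences and seeds are all hypothetical.

```python
from collections import Counter

# Hypothetical toy corpus and seed lexicon, for illustration only.
SENTENCES = [
    "expression of actin was observed",
    "expression of tubulin was observed",
    "expression of myosin was reduced",
    "the binding of kinesin was reduced",
    "purified myosin protein",
    "purified kinesin protein",
    "purified dynein protein",
]
SEEDS = {"actin", "tubulin"}


def tokenize(text):
    # Whitespace tokenization with lowercasing; no tagger or parser is used.
    stripped = (tok.strip(".,;:()").lower() for tok in text.split())
    return [tok for tok in stripped if tok]


def contexts(tokens):
    # Yield (left neighbour, word, right neighbour) for every interior token.
    for i in range(1, len(tokens) - 1):
        yield tokens[i - 1], tokens[i], tokens[i + 1]


def bootstrap(sentences, seeds, rounds=3, min_pattern_hits=2):
    lexicon = set(seeds)
    tokenized = [tokenize(s) for s in sentences]
    for _ in range(rounds):
        # Step 1: count the contexts in which already-known terms occur.
        pattern_hits = Counter()
        for toks in tokenized:
            for left, word, right in contexts(toks):
                if word in lexicon:
                    pattern_hits[(left, right)] += 1
        # Assumption: a pattern is trusted once it has enough supporting terms.
        patterns = {p for p, n in pattern_hits.items() if n >= min_pattern_hits}
        # Step 2: unknown words that fill a trusted pattern become new terms.
        new_terms = set()
        for toks in tokenized:
            for left, word, right in contexts(toks):
                if (left, right) in patterns and word not in lexicon:
                    new_terms.add(word)
        if not new_terms:
            break  # Nothing new was found, so further rounds cannot help.
        lexicon |= new_terms
    return lexicon
```

On this toy corpus, the first round learns the pattern ("of", "was") from the seeds and picks up "myosin" and "kinesin"; those in turn support the pattern ("purified", "protein") in the second round, which yields "dynein". A fixed hit threshold is the simplest possible filter; a real system would need a stricter pattern-scoring step to limit the drift that the reported 58% to 71% accuracy range suggests.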
Similar resources
Automatically Creating Bilingual Lexicons for Machine Translation from Bilingual Text
A method is presented for automatically augmenting the bilingual lexicon of an existing Machine Translation system, by extracting bilingual entries from aligned bilingual text. The proposed method only relies on the resources already available in the MT system itself. It is based on the use of bilingual lexical templates to match the terminal symbols in the parses of the aligned sentences.
Syntactic Parsing as a Step for Automatically Augmenting Semantic Lexicons
This paper investigates how, and to what extent the flexibility and robustness of a partial parser can be utilized to automatically extend existing semantic lexicons. Our work is based on the observation that members of a semantic group are often surrounded by other members of the same group in text. Given a few category members we collect surrounding contexts and try to identify other words th...
Lexical Semantic Resources in a Terminological Network
Research is under way aimed at constructing three specialized lexicons organized as relational databases. The three databases contain terms belonging to the specialized knowledge fields of maritime terminology (technical-nautical and maritime transport domain), taxation law, and labour law with union labour rules, respectively. The EuroWordNet/Ita...
Towards a Standardized Linguistic Annotation of the Textual Content of Labels in Knowledge Representation Systems
We propose applying standardized linguistic annotation to terms included in labels of knowledge representation schemes (taxonomies or ontologies), hypothesizing that this would help improve ontology-based semantic annotation of texts. We share the view that currently used methods for including lexical and terminological information in such hierarchical networks of concepts are not satisfactor...
Automatically Generating Extraction Patterns from Untagged Text
Many corpus-based natural language processing systems rely on text corpora that have been manually annotated with syntactic or semantic tags. In particular, all previous dictionary construction systems for information extraction have used an annotated training corpus or some form of annotated input. We have developed a system called AutoSlog-TS that creates dictionaries of extraction patterns u...
Journal title:
Volume, issue:
Pages: -
Publication date: 2000